The idea is to explore how spatial aggregation changes the spatial pattern. Apparently, geographers call this the Modifiable Areal Unit Problem (MAUP), so it’s not a new idea. But, there is no agreed-upon solution to the MAUP other than perhaps not using spatial aggregation at all.
To start, I’m just going to work on the simplest case, where we have spatial point data and we want to see it spatially aggregated. By starting with points, it will be easy to do the aggregation into differently-sized spatial bins, without having to do any up-, down-, or side-scaling.
I was able to get two Stamen maps of LA loaded, one interactive (not shown) and one static.
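A minimal sketch of loading a static Stamen base map with ggmap; the bounding box coordinates here are rough approximations of the LA area, not the exact values used:

```r
library(ggmap)

# Rough bounding box for Los Angeles (coordinates are approximate)
la_bbox <- c(left = -118.7, bottom = 33.6, right = -117.9, top = 34.4)
la_map <- get_stamenmap(la_bbox, zoom = 10, maptype = "toner-lite")
ggmap(la_map)  # static base map; points and polygons get layered on top
```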
For my smallest spatial aggregation unit, I’m using Census tracts. Previously I was using Census blocks, but they were way too small; a Census block is essentially a city block, like the blocks on a street.
The shapefile data is provided by the Census, so it’s reasonably easy to load the shapefiles and plot them over the base map.
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "LAtracts"
## with 2346 features and 12 fields
## Feature type: wkbPolygon with 2 dimensions
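The load that produced the output above was presumably a readOGR() call along these lines (a sketch; the spTransform target CRS is an assumption, chosen to match the lon/lat base map):

```r
library(rgdal)
library(sp)

# Load the tract polygons (dsn/layer match the output above)
LAtracts <- readOGR(dsn = ".", layer = "LAtracts")
# Reproject to lon/lat so the polygons line up with the base map
LAtracts <- spTransform(LAtracts, CRS("+proj=longlat +datum=WGS84"))
```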
The first dataset I tried was scraped from Yelp: all the Korean restaurants listed in Los Angeles. Getting the data was a little tricky, and I’ll have to re-use it for something else, because this wasn’t the right application for it. But still.
To be able to plot the number of restaurants in each spatial area, we have to do a spatial search to see how many points are in each polygon.
Then, we join the counts of restaurants in each area back onto the tracts shapefile.
I was having trouble getting it to plot, but it seems my problem was getting the area names in the counts to match the names in the shapefile.
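One way to sidestep name matching entirely is to join by polygon index with sp's over(). A sketch, assuming `restaurants` is a SpatialPointsDataFrame of the Yelp points in the same CRS as the tracts (both object names are assumptions):

```r
library(sp)

# over() returns, for each point, the index of the polygon containing it
# (NA for points outside every polygon)
idx <- over(restaurants, geometry(LAtracts))
# tabulate() counts points per polygon index; nbins keeps tracts with
# zero restaurants in the result
LAtracts$nRestaurants <- tabulate(idx, nbins = length(LAtracts))
spplot(LAtracts, "nRestaurants")
```

Because the counts come back in polygon order, they attach directly to the shapefile's data slot with no name matching at all.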
For a spatial aggregation level above the tract size, I found data about the boundaries of LA neighborhoods.
## OGR data source with driver: ESRI Shapefile
## Source: "LANeighborhoodCouncils", layer: "NeighborhoodCouncils"
## with 95 features and 7 fields
## Feature type: wkbPolygon with 2 dimensions
Similar to the tracts, we can then calculate the number of restaurants within each polygon and then plot the spatial pattern.
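Since this count-points-in-polygons step recurs at every aggregation level (tracts, neighborhoods, zip codes), one option is to wrap it in a small helper. A sketch; the `neighborhoods` and `restaurants` names in the usage line are assumptions:

```r
library(sp)

# Count how many points fall in each polygon; both arguments are
# assumed to be in the same CRS
countPointsInPolys <- function(points, polys) {
  idx <- over(points, geometry(polys))  # polygon index for each point
  tabulate(idx, nbins = length(polys))  # zero-count polygons included
}

# Hypothetical usage with the neighborhood polygons:
neighborhoods$nRestaurants <- countPointsInPolys(restaurants, neighborhoods)
spplot(neighborhoods, "nRestaurants")
```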
Next, we use zip codes.
## OGR data source with driver: ESRI Shapefile
## Source: "CAMS_ZIPCODE_PARCEL_SPECIFIC", layer: "CAMS_ZIPCODE_PARCEL_SPECIFIC"
## with 311 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
I want data with more coverage than the Korean restaurants, so I’m going to use the streetlight data I got about LA a while back. It doesn’t make sense to plot absolute numbers (it didn’t really make sense with the restaurants, either), so I need to calculate lights per person in each spatial area. This means I need some population data.
streetlights <- read.csv("../Streetlights/STLIGHT.csv")
head(streetlights)
## SLID STLID STATUS LASTPLAN POSTDESC LAMPA LAMPB
## 1 182945 D28985-24 AsBuilt LED5216 CD814E 82W Cree-C 80LED
## 2 182944 D28985-23 AsBuilt LED5216 CD814E 136W Leotek 60LED-E
## 3 103188 D3708-1 AsBuilt C1009 CD807 200W HPS
## 4 166647 P32489-1 AsBuilt LED5216 CD861 82W Cree-C 80LED
## 5 182946 D28985-25 AsBuilt LED5216 CD814E 136W Leotek 60LED-E
## 6 97579 D28985-22 AsBuilt LED5216 CD814E 82W Cree-C 80LED
## LAMPC LAMPD LAMPE LAMPF Latitude Longitude
## 1 -118.3 33.71
## 2 -118.3 33.71
## 3 -118.3 33.71
## 4 -118.3 33.71
## 5 -118.3 33.71
## 6 -118.3 33.71
rownames(streetlights) <- NULL
# Note: the Latitude and Longitude columns appear to be swapped in the
# source CSV (Latitude holds values like -118.3), so this formula puts
# the east-west values on the x axis, as intended
coordinates(streetlights) <- ~Latitude+Longitude
proj4string(streetlights) <- CRS("+proj=longlat +datum=WGS84")
The population data came from the Census again (like most of the shapefiles).
We can plot both the absolute number of streetlights,
and the number of people per streetlight. So if the scale says 100, that means there are 100 people per streetlight in that particular spatial area.
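A sketch of the per-capita step, assuming `zips` is the zip-code SpatialPolygonsDataFrame with a Census population column `POP`, and `nLights` is the vector of per-polygon streetlight counts from over()/tabulate() (all three names are assumptions):

```r
library(sp)

zips$nLights <- nLights
# People per streetlight; NA where a polygon has no lights at all,
# to avoid dividing by zero
zips$peoplePerLight <- ifelse(zips$nLights > 0,
                              zips$POP / zips$nLights, NA)
spplot(zips, "peoplePerLight")
```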
What did we learn? Well, streetlights are basically distributed according to population by zip code. When I had the plotting reversed (streetlights per person), Universal City really stood out, I assume because no one lives there.
For this data source, I worked from the largest spatial area to the smallest, so I need to go back and try this with smaller areas. But it seems likely that this isn’t the right data for this problem either.
This task is pretty hard in R. Getting the shapefiles loaded, and then verifying they loaded properly, is a task in and of itself. Then, getting the data to aggregate properly is also pretty hard. It’s clear that some more abstractions would make this easier: for example, if you could visually see the polygons and points on top of one another, demonstrate on one spatial area what you wanted done, and have the system repeat the action n times, where n is the number of polygons. That kind of view would also let you see more easily what was getting aggregated.